Description
COPYRIGHT NOTICE 

[0001] A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

SOFTWARE APPENDICES

[0002] A Software Appendix of source code for an embodiment of the invention including two (2) sheets is included 
herewith. 

BACKGROUND OF THE INVENTION

[0003] The present invention relates to the field of image processing. More specifically, the present invention relates 
to computer systems for aligning grids on a scanned image of a chip including hybridized nucleic acid sequences. 
[0004] Devices and computer systems for forming and using arrays of materials on a chip or substrate are known. For example, PCT applications WO 92/10588 and 95/11995, both incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids and other materials. Arrays for performing these operations may be formed according to the methods of, for example, the pioneering techniques disclosed in U.S. Patent Nos. 5,445,934, 5,384,261 and 5,571,639, each incorporated herein by reference for all purposes.
[0005] According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip. A labeled nucleic acid is then brought into contact with the chip and a scanner generates an image file (also called a cell file) indicating the locations where the labeled nucleic acids are bound to the chip. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as the nucleotide or monomer sequence of DNA or RNA. Such systems have been used to form, for example, arrays of DNA that may be used to study and detect mutations relevant to genetic diseases, cancers, infectious diseases, HIV, and other genetic characteristics.

[0006] The VLSIPS™ technology provides methods of making very large arrays of oligonucleotide probes on very small chips. See U.S. Patent No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference for all purposes. The oligonucleotide probes on the DNA probe array are used to detect complementary nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).

[0007] For sequence checking applications, the chip may be tiled for a specific target nucleic acid sequence. As an example, the chip may contain probes that are perfectly complementary to the target sequence and probes that differ from the target sequence by a single base mismatch. For de novo sequencing applications, the chip may include all the possible probes of a specific length. The probes are tiled on a chip in rows and columns of cells, where each cell includes multiple copies of a particular probe. Additionally, "blank" cells may be present on the chip which do not include any probes. As the blank cells contain no probes, labeled targets should not bind specifically to the chip in this area. Thus, a blank cell provides a measure of the background intensity.

[0008] In the scanned image file, a cell is typically represented by multiple pixels. Although a visual inspection of the scanned image file may be performed to identify the individual cells, it would be desirable to utilize computer-implemented image processing techniques to align the scanned image file.


SUMMARY OF THE INVENTION 

[0009] Embodiments of the present invention provide innovative techniques for aligning scanned images. A pattern is included in the scanned image so that when the image is convolved with a filter, a recognizable pattern is generated in the convolved image. The scanned image may then be aligned according to the position of the recognizable pattern in the convolved image. The filter may also act to remove or "filter out" the portions of the scanned image that do not correspond to the pattern in the scanned image. Several embodiments of the invention are described below.
[0010] In one embodiment, the invention provides a computer-implemented method of aligning scanned images. The scanned image is convolved with a filter. The scanned image includes a first pattern that the filter will convolve into a second pattern in the convolved image. The scanned image is then aligned according to the position of the second pattern in the convolved image. In a preferred embodiment, the first pattern may be a checkerboard pattern that is convolved into a grid pattern in the convolved image.

[0011] In another embodiment, the invention provides a method of aligning scanned images of chips with hybridized nucleic sequences. A chip having attached nucleic acid sequences (probes) is synthesized, with the chip including a first pattern of nucleic acid sequences. Labeled nucleic acid sequences are hybridized to nucleic acid sequences on the chip and the hybridized chip is scanned to produce a scanned image. The scanned image is convolved with a filter that will convolve the first pattern into a second pattern in the convolved image. The scanned image is then aligned according to the position of the second pattern in the convolved image. In a preferred embodiment, the first pattern may be a checkerboard pattern that is generated by control nucleic acid sequences that hybridize to alternating squares in the checkerboard pattern.

[0012] Other features and advantages of the invention will become readily apparent upon review of the following 
detailed description in association with the accompanying drawings. 


BRIEF DESCRIPTION OF THE DRAWINGS 
[0013] 

Fig. 1 illustrates an example of a computer system that may be utilized to execute the software of an embodiment of the invention.
Fig. 2 illustrates a system block diagram of the computer system of Fig. 1.
Fig. 3 illustrates an overall system for forming and analyzing arrays of biological materials such as DNA or RNA.
Fig. 4 is a high level flowchart of a process of synthesizing a chip.
Fig. 5 illustrates conceptually the binding of probes on chips.
Fig. 6 illustrates a flowchart of how a chip is hybridized and analyzed to produce experimental results.
Fig. 7A shows a checkerboard pattern in a scanned image and Fig. 7B shows a grid that has been aligned over the scanned image to show the individual cells on the chip.
Fig. 8 illustrates a flowchart of a process of image alignment.
Fig. 9A shows a checkerboard pattern in a scanned image and Fig. 9B shows a convolved image of Fig. 9A with a grid pattern that was generated by the checkerboard pattern.
Fig. 10 illustrates a flowchart of a process of convolving the scanned image.
Fig. 11 shows neighbor pixels that may be analyzed to produce a convolved pixel in the convolved image.
Figs. 12A-12D show how the filter may be moved over the scanned image to produce the convolved image.
Fig. 13 illustrates a flowchart of a process of refining the grid alignment over the scanned image.
Fig. 14 shows the grid lines in the scanned image that may be analyzed to refine the grid alignment.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

Overview

[0014] In the description that follows, the present invention will be described in reference to preferred embodiments that utilize VLSIPS™ technology for making very large arrays of oligonucleotide probes on chips. However, the invention is not limited to images produced in this fashion and may be advantageously applied to other hybridization technologies or to images in other technology areas. Therefore, the description of the embodiments that follows is for purposes of illustration and not limitation.

[0015] Fig. 1 illustrates an example of a computer system that may be used to execute the software of an embodiment of the invention. Fig. 1 shows a computer system 1 that includes a display 3, screen 5, cabinet 7, keyboard 9, and mouse 11. Mouse 11 may have one or more buttons for interacting with a graphical user interface. Cabinet 7 houses a CD-ROM drive 13, system memory and a hard drive (see Fig. 2) which may be utilized to store and retrieve software programs incorporating computer code that implements the invention, data for use with the invention, and the like. Although a CD-ROM 15 is shown as an exemplary computer readable storage medium, other computer readable storage media including floppy disk, tape, flash memory, system memory, and hard drive may be utilized. Additionally, a data signal embodied in a carrier wave (e.g., in a network including the Internet) may be the computer readable storage medium.

[0016] Fig. 2 shows a system block diagram of computer system 1 used to execute the software of an embodiment of the invention. As in Fig. 1, computer system 1 includes monitor 3, keyboard 9, and mouse 11. Computer system 1 further includes subsystems such as a central processor 51, system memory 53, fixed storage 56 (e.g., hard drive), removable storage 57 (e.g., CD-ROM drive), display adapter 59, sound card 61, speakers 63, and network interface 65. Other computer systems suitable for use with the invention may include additional or fewer subsystems. For example, another computer system could include more than one processor 51 (i.e., a multi-processor system) or a cache memory.

[0017] The system bus architecture of computer system 1 is represented by arrows 67. However, these arrows are illustrative of any interconnection scheme serving to link the subsystems. For example, a local bus could be utilized to connect the central processor to the system memory and display adapter. Computer system 1 shown in Fig. 2 is but an example of a computer system suitable for use with the invention. Other computer architectures having different configurations of subsystems may also be utilized.

[0018] The present invention provides methods of aligning scanned images or image files of hybridized chips including nucleic acid probes. In a representative embodiment, the scanned image files include fluorescence data from a biological array, but the files may also represent other data such as radioactive intensity, light scattering, refractive index, conductivity, electroluminescence, or large molecule detection data. Therefore, the present invention is not limited to analyzing fluorescence measurements of hybridization but may be readily utilized to analyze other measurements of hybridization.

[0019] For purposes of illustration, the present invention is described as being part of a computer system that designs a chip mask, synthesizes the probes on the chip, labels the nucleic acids, and scans the hybridized nucleic acid probes. Such a system is fully described in U.S. Patent No. 5,571,639, which has been incorporated by reference for all purposes. However, the present invention may be used separately from the overall system for analyzing data generated by such systems.

[0020] Fig. 3 illustrates a computerized system for forming and analyzing arrays of biological materials such as RNA or DNA. A computer 100 is used to design arrays of biological polymers such as RNA and DNA. The computer 100 may be, for example, an appropriately programmed Sun Workstation or personal computer or workstation, such as an IBM PC equivalent, including appropriate memory and a CPU as shown in Figs. 1 and 2. The computer system 100 obtains inputs from a user regarding characteristics of a gene of interest, and other inputs regarding the desired features of the array. Optionally, the computer system may obtain information regarding a specific genetic sequence of interest from an external or internal database 102 such as GenBank. The output of the computer system 100 is a set of chip design computer files 104 in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.

[0021] The chip design files are provided to a system 106 that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. The system or process 106 may include the hardware necessary to manufacture masks 110 and also the computer hardware and software 108 necessary to lay the mask patterns out on the mask in an efficient manner. As with the other features in Fig. 3, such equipment may or may not be located at the same physical site but is shown together for ease of illustration in Fig. 3. The system 106 generates masks 110 or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.

[0022] The masks 110, as well as selected information relating to the design of the chips from system 100, are used in a synthesis system 112. Synthesis system 112 includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip 114. For example, synthesizer 112 includes a light source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed. Mask 110 is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through flow cell 118 for coupling to deprotected regions, as well as for washing and other operations. All operations are preferably directed by an appropriately programmed computer 119, which may or may not be the same computer as the computer(s) used in mask design and mask making.
[0023] The substrates fabricated by synthesis system 112 are optionally diced into smaller chips and exposed to marked targets. The targets may or may not be complementary to one or more of the molecules on the substrate. The targets are marked with a label such as a fluorescein label (indicated by an asterisk in Fig. 3) and placed in scanning system 120. Scanning system 120 again operates under the direction of an appropriately programmed digital computer 122, which also may or may not be the same computer as the computers used in synthesis, mask making, and mask design. The scanner 120 includes a detection device 124 such as a confocal microscope or CCD (charge-coupled device) that is used to detect the location where labeled target (*) has bound to the substrate. The output of scanner 120 is an image file(s) 124 indicating, in the case of fluorescein labeled target, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled target has bound more strongly to the array of polymers (e.g., DNA probes on the substrate), and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the target.
[0024] The image file 124 is provided as input to an analysis system 126 that incorporates the scanned image alignment techniques of the present invention. Again, the analysis system may be any one of a wide variety of computer system(s), but in a preferred embodiment the analysis system is based on a WINDOWS NT workstation or equivalent. The analysis system may analyze the image file(s) to generate appropriate output 128, such as the identity of specific mutations in a target such as DNA or RNA.

[0025] Fig. 4 is a high level flowchart of a process of synthesizing a chip. At a step 201, the desired chip characteristics are input to the chip synthesis system. The chip characteristics may include (such as for sequence checking systems) the genetic sequence(s) or targets that would be of interest. The sequences of interest may, for example, identify a virus, microorganism or individual. Additionally, the sequence of interest may provide information about genetic diseases, cancers or infectious diseases. Sequence selection may be provided via manual input of text files or may be from external sources such as GenBank. In a preferred embodiment that performs de novo sequencing of target nucleic acids, this step is not necessary as the chip includes all the possible n-mer probes (where n represents the length of the nucleic acid probe).

[0026] For de novo sequencing, a chip may be synthesized to include cells containing all the possible probes of a 
specific length. For example, a chip may be synthesized that includes all the possible 8-mer DNA probes. Such a chip 
would have 65,536 cells (4*4*4*4*4*4*4*4), with each cell corresponding to a particular probe. A chip may also include 
other probes including all the probes of other lengths. 
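
The cell count above follows from having four choices of base at each of n positions, which gives 4^n distinct probes. The short Python sketch below is offered only as an illustration of that arithmetic; the function name all_probes is an assumption introduced here and is not part of the disclosed software.

```python
from itertools import product

BASES = "ACGT"

def all_probes(n):
    """Yield every possible DNA probe of length n (4**n sequences)."""
    for combo in product(BASES, repeat=n):
        yield "".join(combo)

# Counting the 8-mers reproduces the cell count stated in the text.
count_8mers = sum(1 for _ in all_probes(8))
print(count_8mers)  # 65536
```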

[0027] At a step 203 the system determines which probes would be desirable on the chip, and provides an appropriate 
"layout" on the chip for the probes. The layout implements desired characteristics such as an arrangement on the chip 
that permits "reading" of genetic sequence and/or minimization of edge effects, ease of synthesis, and the like. 
[0028] The masks for the chip synthesis are designed at a step 205. The masks are designed according to the desired 
chip characteristics and layout. At a step 207, the system synthesizes the DNA or other polymer chips. Software con- 
trols, among other things, the relative translation of the substrate and mask, the flow of the desired reagents through a 
flow cell, the synthesis temperature of the flow cell, and other parameters. 

[0029] Fig. 5 illustrates the binding of a particular target DNA to an array of DNA probes 114. As shown in this simple example, the following probes are formed in the array:

    3'-AGAACGT
       AGACCGT
       AGAGCGT
       AGATCGT

As shown, when the fluorescein-labeled (or otherwise marked) target 5'-TCTTGCA is exposed to the array, it is complementary only to the probe 3'-AGAACGT, and fluorescein will be primarily found on the surface of the chip where 3'-AGAACGT is located. The chip contains cells that include multiple copies of a particular probe. Thus, the image file will contain fluorescence intensities, one for each probe (or cell). By analyzing the fluorescence intensities associated with a specific probe, it becomes possible to extract sequence information from such arrays using the methods of the invention disclosed herein.
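
The complementarity in the Fig. 5 example can be checked mechanically. The following Python sketch is illustrative only: it assumes that all four probes are written in the 3' to 5' direction (the text labels only the first one), and the name is_complementary is hypothetical.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_complementary(target_5to3, probe_3to5):
    """Return True if the target (written 5'->3') pairs base-for-base
    with the probe (written 3'->5'), as in the Fig. 5 example."""
    if len(target_5to3) != len(probe_3to5):
        return False
    return all(COMPLEMENT[t] == p for t, p in zip(target_5to3, probe_3to5))

probes = ["AGAACGT", "AGACCGT", "AGAGCGT", "AGATCGT"]  # assumed 3'->5'
target = "TCTTGCA"                                      # written 5'->3'
matches = [p for p in probes if is_complementary(target, p)]
print(matches)  # ['AGAACGT']
```

Only the first probe pairs at every position; the other three differ from it at the fourth base, which is the single-base mismatch arrangement described for sequence checking chips.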

[0030] For ease of reference, one may call bases by assigning the bases the following codes:

Code    Group               Meaning
A       A                   Adenine
C       C                   Cytosine
G       G                   Guanine
T       T(U)                Thymine (Uracil)
M       A or C              aMino
R       A or G              puRine
W       A or T(U)           Weak interaction (2 H bonds)
Y       C or T(U)           pYrimidine
S       C or G              Strong interaction (3 H bonds)
K       G or T(U)           Keto
V       A, C or G           not T(U)
H       A, C or T(U)        not G
D       A, G or T(U)        not C
B       C, G or T(U)        not A
N       A, C, G, or T(U)    Insufficient intensity to call
X       A, C, G, or T(U)    Insufficient discrimination to call

Most of the codes conform to the IUPAC standard. However, code N has been redefined and code X has been added.
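
As a rough illustration of how the code table might be held in software, the Python sketch below maps each code to its group of bases and back. It is an assumption-laden sketch rather than the disclosed implementation: the names BASE_CODES, bases_for and code_for are hypothetical, T stands for T(U), and the reverse lookup deliberately omits N and X because both cover all four bases and differ only in why no call was made.

```python
# Code -> group of bases, following the table (T stands for T or U).
BASE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "Y": "CT", "S": "CG", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",  # redefined: insufficient intensity to call
    "X": "ACGT",  # added: insufficient discrimination to call
}

# Reverse lookup for the unambiguous groups only (N and X excluded).
GROUP_TO_CODE = {
    frozenset(group): code
    for code, group in BASE_CODES.items()
    if code not in ("N", "X")
}

def bases_for(code):
    """Return the set of bases a code may stand for."""
    return set(BASE_CODES[code])

def code_for(bases):
    """Return the code for a group of one to three candidate bases."""
    return GROUP_TO_CODE[frozenset(bases)]

print(code_for("AG"))          # R
print(sorted(bases_for("V")))  # ['A', 'C', 'G']
```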


Scanned Image Alignment 

[0031] Before the scanned image alignment techniques of the invention are discussed, it may be helpful to provide an overview of the overall process in one embodiment. Fig. 6 illustrates a flowchart of a process of how a chip is hybridized and analyzed to produce experimental results. A chip 251 having attached nucleic acid sequences (or probes) is combined with a sample nucleic acid sequence (e.g., labeled fragments of the sample) and reagents in a hybridization step 255. The hybridization step produces a hybridized chip 257.

[0032] The hybridized chip is scanned at a step 259. For example, the hybridized chip may be laser scanned to detect where fluorescein-labeled sample fragments have hybridized to the chip. Numerous techniques may be utilized to label the sample fragments and the scanning process will typically be performed according to the type of label utilized. The scanning step produces a digital image of the chip.

[0033] In preferred embodiments, the scanned image of the chip includes varying fluorescent intensities that correspond to the hybridization intensity or affinity of the sample to the probes in a cell. In order to achieve more accurate results, it is beneficial to identify the pixels that belong to each cell on the chip. At an image alignment step 263, the scanned image is aligned so that the pixels that correspond to each cell can be identified. Optionally, the image alignment step includes the alignment of a grid over the scanned image (see Fig. 7B).

[0034] At a step 267, the analysis system analyzes the scanned image to calculate the relative hybridization intensities for each cell of interest on the chip. For example, the hybridization intensity for a cell, and therefore the relative hybridization affinity between the probe of the cell and the sample sequence, may be calculated as the mean of the pixel values within the cell. The pixel values may correspond to photon counts from the labeled hybridized sample fragments.
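
The mean-of-pixels calculation at step 267 can be sketched as follows. This is a minimal illustration that assumes the image is a list of rows of pixel values and that an aligned cell is an axis-aligned rectangle; the function name and its parameters are hypothetical.

```python
def cell_intensity(image, top, left, height, width):
    """Mean pixel value within one cell's rectangle."""
    pixels = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(pixels) / len(pixels)

# Two 2x2-pixel cells side by side: a dim cell and a bright cell.
image = [
    [10, 12, 90, 94],
    [14, 12, 92, 96],
]
print(cell_intensity(image, 0, 0, 2, 2))  # 12.0
print(cell_intensity(image, 0, 2, 2, 2))  # 93.0
```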
[0035] The cell intensities may be stored as a cell intensity file 269. In preferred embodiments, the cell intensity file includes a list of cell intensities for the cells. At an analysis step 271, the analysis system may analyze the cell intensity file and chip characteristics to generate results 273. The chip characteristics may be utilized to identify the probes that have been synthesized at each cell on the chip. By analyzing both the sequence of the probes and their hybridization intensities from the cell intensity file, the system is able to extract sequence information such as the location of mutations, deletions or insertions, or the sequence of the sample nucleic acid. Accordingly, the results may include sequence information, graphs of the hybridization intensities of probe(s), graphs of the differences between sequences, and the like. See U.S. Patent Application No. 08/327,525, which is hereby incorporated by reference for all purposes.
[0036] In order to align the scanned image, the invention provides a pattern in the scanned image that will be convolved into a recognizable pattern. In preferred embodiments, the pattern in the scanned image is a checkerboard pattern that is generated by synthesizing alternating cells that include probes that are complementary to a control nucleic acid sequence. The control nucleic acid sequence may be a known sequence that is labeled and hybridized to the chip for the purpose of aligning the scanned image. Additionally, the brightness of the cells complementary to the control nucleic acid sequence may be utilized as a baseline or for comparison to other intensities.
[0037] As an example, Fig. 7A shows a checkerboard pattern in a hybridized chip. A scanned image 301 of a hybridized chip includes an active area 303 where the probes were synthesized. At the corner of the active area is a pattern 305 that is a checkerboard pattern. Typically, the pattern appears at each corner of the active area of the scanned image. Although the pattern is shown as being a checkerboard pattern, in other embodiments the pattern is a circle, square, plus sign, or any other pattern.

[0038] With regard to Fig. 6, it was stated that a grid may optionally be placed over the scanned image to show or delineate the individual cells of the chip. Fig. 7B shows a grid that has been aligned over the scanned image of Fig. 7A to show the individual cells of the chip. As shown, a grid 307 has been placed over active area 303 of hybridized chip 301.

[0039] Fig. 8 illustrates a flowchart of a process of image alignment. The flowchart shows detail for step 263 of Fig. 6. At a step 351, the scanned image is convolved with a filter. The filter is typically a software filter that convolves the scanned image into a convolved image. When the scanned image is convolved, a pattern in the scanned image is convolved into a recognizable pattern. The position of the recognizable pattern in the convolved image may be utilized to align the scanned image, such as by placing a grid over the image.

[0040] At a step 353, the convolved image is searched for bright areas. When the scanned image is convolved, the pattern(s) in the scanned image will be convolved into a recognizable pattern or patterns of bright areas. Accordingly, once bright areas are identified in the convolved image, the system confirms that the bright areas are in the expected recognizable pattern (e.g., a grid pattern) at a step 355.
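
The text does not specify how bright areas are located or how the grid pattern is confirmed, so the Python sketch below is only one plausible reading of steps 353 and 355. It assumes a fixed intensity threshold and treats a "grid pattern" as bright points that exactly fill a lattice of the expected number of rows and columns; both choices, and the names find_bright and is_grid, are assumptions.

```python
def find_bright(convolved, threshold):
    """Step 353 (sketch): coordinates whose value exceeds the threshold."""
    return [(r, c)
            for r, row in enumerate(convolved)
            for c, value in enumerate(row)
            if value > threshold]

def is_grid(points, n_rows, n_cols):
    """Step 355 (sketch): do the points fill an n_rows x n_cols lattice?"""
    rows = sorted({r for r, _ in points})
    cols = sorted({c for _, c in points})
    if len(rows) != n_rows or len(cols) != n_cols:
        return False
    return set(points) == {(r, c) for r in rows for c in cols}

convolved = [
    [0,  0,  0, 0],
    [0, 80, 75, 0],
    [0, 90, 85, 0],
    [0,  0,  0, 0],
]
bright = find_bright(convolved, threshold=50)
print(is_grid(bright, 2, 2))  # True
```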

[0041] In order to better understand what is meant by the different patterns, Fig. 9A shows a checkerboard pattern 401 in a scanned image 403. Fig. 9B shows a recognizable pattern 451 in convolved image 453. The convolved image was generated from the scanned image of Fig. 9A. As shown, recognizable pattern 451 in this embodiment is a grid pattern that was generated by the checkerboard pattern when it was convolved with a filter. Additionally, it should be noted that the filter acted to remove the other pixel intensities so that the convolved image only includes the recognizable pattern. By removing pixel intensities that are not part of the pattern in the scanned image, it is easier to align the scanned image.

[0042] Fig. 10 illustrates a flowchart of a process of convolving the scanned image. The flowchart illustrates a process that may be performed at step 351 of Fig. 8. At a step 501, a pixel is selected. For simplicity, we will assume that the process selects pixels of the scanned image from left to right and top to bottom. Of course, the order in which the pixels are analyzed may be varied.

[0043] Once a pixel is selected, neighbor pixels may then be selected at a step 503. By neighbor pixels, it is meant pixels that are near, but not necessarily adjacent to, a pixel. For example, Fig. 11 shows neighbor pixels that may be analyzed to produce a convolved pixel in a convolved image. As shown in Fig. 11, there are 9 pixels labeled 1-9. In a preferred embodiment, pixel 1 is the pixel retrieved at step 501 and the neighbor pixels retrieved at step 503 are pixels 2-9. Of course, any number or location of different neighbor pixels may be utilized.

[0044] At a step 505, the average of the odd pixels and the average of the even pixels are determined. Referring again to Fig. 11, the intensities of pixels 1, 3, 5, 7, and 9 may be averaged to produce the average of the odd pixels (AVGo). Similarly, the intensities of pixels 2, 4, 6, and 8 may be averaged to produce the average of the even pixels (AVGe). Thus, the odd pixels may be pixels that have an odd number designation and the even pixels may be pixels that have an even number designation.

[0045] Pixel 1 is convolved into a convolved pixel in a convolved image by determining if the average of the odd pixels is greater than the average of the even pixels at a step 507. If the average of the odd pixels is greater, the convolved pixel is set equal to the intensity of the minimum of the odd pixels minus the intensity of the maximum of the even pixels at a step 509. Otherwise, the convolved pixel is set equal to the intensity of the minimum of the even pixels minus the intensity of the maximum of the odd pixels at a step 511.

[0046] Conceptually, the neighbor pixels may be thought of as being filtered, such as by a software filter in preferred embodiments. With the filter, the system is searching for a checkerboard pattern where all the odd pixels are either darker or lighter than the even pixels. Accordingly, averages of the odd and even pixels are calculated at step 505. Step 507 acts to determine if the pixels likely reflect a checkerboard pattern where the odd pixels, and therefore squares, are light (e.g., high intensity) or dark (e.g., low intensity). If the odd pixels likely reflect a checkerboard pattern where the odd pixels are light, step 509 sets the convolved pixel to the difference between selected odd and even pixels, where the selected odd pixel is the minimum of the odd pixels and the selected even pixel is the maximum of the even pixels. Step 511 is similar but reversed.

[0047] Therefore, at step 509, if all the odd pixels are much brighter than all the even pixels, the difference will be a larger value. Hence, the convolved pixel will be relatively bright (e.g., high intensity). The convolved pixel will also be relatively bright if all the even pixels are much brighter than all the odd pixels at step 511. However, if the difference at step 509 or 511 is very small (or negative), the convolved pixel will be set to a relatively dark intensity. Convolved pixels with negative pixel values may be set to zero in preferred embodiments. In short, if the filter finds a checkerboard pattern, the convolved pixel will be bright and if the filter finds a relatively random pattern, the convolved pixel will be dark (thus, filtering out "noise" that is not the desired pattern).
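
Steps 505 through 511, together with the clamping of negative values, can be written compactly. The Python sketch below follows the rule as stated in the text; the function name convolve_pixel and the sample intensities are illustrative assumptions, not values from the disclosure.

```python
def convolve_pixel(odd, even):
    """Apply steps 505-511 to one pixel's neighborhood.

    odd  -- intensities of pixels 1, 3, 5, 7, 9
    even -- intensities of pixels 2, 4, 6, 8
    """
    avg_odd = sum(odd) / len(odd)      # step 505
    avg_even = sum(even) / len(even)
    if avg_odd > avg_even:             # step 507
        value = min(odd) - max(even)   # step 509
    else:
        value = min(even) - max(odd)   # step 511
    return max(value, 0)               # negative results are set to zero

# Odd pixels uniformly bright, even pixels uniformly dark: bright output.
print(convolve_pixel([200, 210, 190, 205, 195], [20, 30, 25, 15]))  # 160
# The reversed checkerboard is just as bright.
print(convolve_pixel([20, 30, 25, 15, 10], [200, 210, 190, 205]))   # 160
# Mixed intensities (no checkerboard): the difference is negative, so 0.
print(convolve_pixel([200, 20, 190, 30, 195], [180, 25, 210, 15]))  # 0
```

Using the minimum of the brighter group and the maximum of the darker group means a single pixel that breaks the pattern is enough to pull the output toward zero, which is why the filter suppresses regions that are merely bright on average.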

[0048] The recognizable pattern in Fig. 9B, which is a grid pattern, was generated by the software filter of Fig. 10. In order to better see how the recognizable pattern was generated, Figs. 12A-D show how the filter may be moved over the checkerboard to produce a grid pattern in the convolved image. As the filter is convolved over the pattern in the






EP 0 923 050 A2 



scanned image shown in a square 530 in Fig. 12A, a bright square will be generated in the convolved image since a checkerboard pattern will be found. Similarly, a bright square will be generated in the convolved image when the filter is over the pattern in square 530 of Fig. 12B. Of course, the checkerboard patterns in square 530 of Figs. 12A and 12B are reversed, but both will produce a bright square in the convolved image as described above in reference to Fig. 10. Figs. 12C and 12D will also produce two bright squares. Therefore, a 2x2 bright square grid pattern is generated as shown in Fig. 9B.

[0049] Additionally, as the software filter of Fig. 10 acts to filter out signals that are not the desired pattern, the recognizable pattern (e.g., a grid pattern) is easier to identify. The recognizable patterns in the convolved image are utilized to align the scanned image. Returning now to Fig. 10, after a selected pixel is convolved into a convolved pixel by the filter, it is determined if there is another pixel to process in the scanned image at a step 513.
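
The loop of Fig. 10 over a whole image can be sketched as follows. Because Fig. 11 is not reproduced here, the neighbor layout is an assumption: the nine pixels are taken to lie on a 3x3 lattice centered on the selected pixel, with the center and corners treated as the odd pixels and the four edge neighbors as the even pixels, and with a hypothetical spacing parameter for the distance between them. Leaving border pixels at zero is also an assumption.

```python
def convolve_image(image, spacing=1):
    """Return the convolved image for a list-of-rows grayscale image."""
    n_rows, n_cols = len(image), len(image[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for r in range(spacing, n_rows - spacing):        # top to bottom
        for c in range(spacing, n_cols - spacing):    # left to right
            odd, even = [], []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = image[r + dr * spacing][c + dc * spacing]
                    # Centre and corners are "odd"; edge neighbours "even".
                    (odd if (dr + dc) % 2 == 0 else even).append(value)
            if sum(odd) / len(odd) > sum(even) / len(even):
                result = min(odd) - max(even)
            else:
                result = min(even) - max(odd)
            out[r][c] = max(result, 0)   # negative values are set to zero
    return out

# A 4x4 checkerboard with one pixel per square.
checkerboard = [[100 if (r + c) % 2 == 0 else 0 for c in range(4)]
                for r in range(4)]
convolved = convolve_image(checkerboard)
for row in convolved:
    print(row)
# [0, 0, 0, 0]
# [0, 100, 100, 0]
# [0, 100, 100, 0]
# [0, 0, 0, 0]
```

With this layout the filter fits over the 4x4 checkerboard in four positions, and each position yields a bright convolved pixel, which matches the 2x2 bright grid described for Figs. 12A-12D.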

[0050] The following shows how well an embodiment of the invention aligned scanned images of hybridized chips:
The previous method was to analyze the scanned image (unfiltered) to locate bright areas or spots in a checkerboard
pattern. As shown, an embodiment of the invention was able to dramatically increase the accuracy of scanned image
alignment.

Refined Grid Alignment 

[0051] In preferred embodiments, refined image alignment may be performed to further increase the accuracy of the
scanned image alignment. Fig. 13 illustrates a flowchart of a process of refining grid alignment over a scanned image.
Thus, for example, once the above-described process has been performed to align the scanned image, the process in
Fig. 13 may be utilized to refine the alignment.
[0052] At a step 551, pixel intensities on grid lines in the grid are summed. For example, the intensities of the grid in
a vertical direction in the checkerboard pattern in the scanned image may be summed. Fig. 14 shows the grid lines in
the scanned image that may be analyzed to refine the grid alignment. As shown, the pixel intensities of vertical lines
601 of a checkerboard pattern 603 may be summed and stored.

[0053] Then, at a step 553, the system may determine if there are more positions of the grid to analyze. If there are,
the position of the grid may be adjusted at a step 555. Therefore, the grid may be moved left and right by one or more
pixels before the intensities are summed along grid lines at step 551. Once all the positions of the grid have been
analyzed, the system selects a grid position where pixel intensities (e.g., the sum calculated at step 551) are at a minimum.
Therefore, if the pixel intensities for grid lines are lower at another position, the grid is adjusted accordingly. This
refinement will work well if the cells are typically separated by a darker area or line.
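
The loop of steps 551, 553 and 555 can be sketched as follows for the vertical direction. This is an illustrative sketch under stated assumptions rather than the embodiment's code: the `GridImage` struct, the function names, the row-major pixel layout, the rule that a shift is skipped when any grid line would leave the image, and the tie-break toward the smallest movement are all assumptions introduced here.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical image type: row-major, `cols` pixels per row. */
typedef struct {
    int rows;
    int cols;
    const unsigned short *pixels;
} GridImage;

/* Step 551: sum intensities along vertical grid lines located at
 * x = originX + shift + k * cellWidth for k = 0..numLines-1.
 * Returns -1 if any line falls outside the image. */
static long sum_vertical_lines(const GridImage *img, int originX,
                               int cellWidth, int numLines, int shift)
{
    long sum = 0;
    for (int k = 0; k < numLines; k++) {
        int x = originX + shift + k * cellWidth;
        if (x < 0 || x >= img->cols)
            return -1;
        for (int r = 0; r < img->rows; r++)
            sum += img->pixels[(size_t)r * img->cols + x];
    }
    return sum;
}

/* Steps 553/555: try every shift in [-maxShift, +maxShift] and return the
 * shift whose grid-line sum is smallest. Ties go to the smaller movement.
 * Returns 0 if no candidate shift keeps all lines inside the image. */
int refine_grid_x(const GridImage *img, int originX, int cellWidth,
                  int numLines, int maxShift)
{
    int bestShift = 0;
    long bestSum = LONG_MAX;

    for (int shift = -maxShift; shift <= maxShift; shift++) {
        long s = sum_vertical_lines(img, originX, cellWidth, numLines, shift);
        if (s < 0)
            continue;                      /* grid line outside the image */
        if (s < bestSum ||
            (s == bestSum && abs(shift) < abs(bestShift))) {
            bestSum = s;
            bestShift = shift;
        }
    }
    return bestShift;
}
```

A caller would add the returned shift to the grid's x origin. Rejecting shifts that push a line off the image avoids comparing sums taken over different numbers of lines, which would otherwise favor the truncated positions.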
[0054] Although the process in Fig. 13 was described for grid lines in the vertical direction, preferred embodiments
also perform the same grid alignment for the horizontal direction. The distance that the grid is able to be moved for
refinement may be limited. For example, the grid may be limited to movement of one-third of a cell size.
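
As a small illustration of the movement limit, the sketch below clamps a proposed horizontal and vertical grid shift to one-third of the cell size in each direction. The `GridShift` type and both function names are hypothetical, and rounding the limit down with integer division is an assumption, since the text does not say how a fractional limit is treated.

```c
/* Hypothetical pair of refinement shifts, in pixels. */
typedef struct {
    int dx;   /* horizontal shift */
    int dy;   /* vertical shift */
} GridShift;

/* Clamp v to the closed range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Limit a proposed refinement to one-third of a cell in each direction.
 * Assumption: the limit is rounded down to whole pixels. */
GridShift clamp_grid_shift(GridShift proposed, int cellWidth, int cellHeight)
{
    int maxDx = cellWidth / 3;
    int maxDy = cellHeight / 3;
    GridShift out;

    out.dx = clamp_int(proposed.dx, -maxDx, maxDx);
    out.dy = clamp_int(proposed.dy, -maxDy, maxDy);
    return out;
}
```

For a 9x6 pixel cell the limits are 3 and 2 pixels, so a proposed shift of (5, -4) would be reduced to (3, -2). Such a limit keeps the refinement from snapping the grid onto the dark gap belonging to a neighboring cell.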
[0055] The following shows how well an embodiment of the invention aligned scanned images of hybridized chips
utilizing the refined grid alignment:

Once again, the previous method was to analyze the scanned image (unfiltered) to locate bright areas or spots in a
checkerboard pattern. As shown, an embodiment of the invention was able to dramatically increase the accuracy of
scanned image alignment. Furthermore, refining grid alignment increased the percentage of scanned images that were
perfectly aligned with the invention from 4% to 64%. Therefore, performing a refinement of grid alignment can
significantly increase the accuracy of the grid alignment.

Conclusion

[0056] While the above is a complete description of preferred embodiments of the invention, various alternatives,
modifications, and equivalents may be used. It should be evident that the invention is equally applicable by making
appropriate modifications to the embodiments described above. For example, the invention has been described in
reference to a checkerboard pattern in the scanned image. However, the invention is not limited to any one pattern and
may be advantageously applied to other patterns including those described herein. Therefore, the above description
should not be taken as limiting the scope of the invention that is defined by the metes and bounds of the appended
claims along with their full scope of equivalents.

Software listing of the algorithm:

////////////////////////////////////////////////////////////////////////////
// CheckerFilt
// purpose
//   perform a checker-board kernel filter on the image.
// input
//   cellWidth, cellHeight, size of the cell
//   *img, the # of rows and columns in the image and the image data
// output
//   *img, the image is filtered in place

void CheckerFilt(int cellWidth, int cellHeight, IMAGE *img)
{
    int row, col, rowBegin, nRows, nCols, colBegin, rowEndFilter, colEndFilter, imgOffset;
    int oddAvg, evenAvg, oddMin, oddMax, evenMin, evenMax;
    int temp;
    PIX_T *e1=NULL, *e2=NULL, *e3=NULL, *e4=NULL, *e5=NULL, *e6=NULL, *e7=NULL, *e8=NULL, *e9=NULL;

    //
    // Determine the range of rows and columns to filter
    rowBegin=0;
    colBegin=0;
    nRows=img->rows;
    nCols=img->cols;
    rowEndFilter=nRows-1-2*cellHeight;
    colEndFilter=nCols-1-2*cellWidth;

    // For each row
    for (row=rowBegin; row<=rowEndFilter; row++)
    {
        // Initialize the filter's pointers
        //   e1 e2 e3
        //   e4 e5 e6
        //   e7 e8 e9
        //
        Set3x3Pointers(img, row, cellWidth, cellHeight, &e1, &e2, &e3, &e4, &e5, &e6, &e7, &e8, &e9);

        // walk the row, doing the filter
        for (col=colBegin; col<=colEndFilter; col++)
        {
            // Avg1 = Average pixels 1, 3, 5, 7, 9
            // Avg2 = Average pixels 2, 4, 6, 8
            oddAvg = (e1[col] + e3[col] + e5[col] + e7[col] + e9[col])/5;
            evenAvg = (e2[col] + e4[col] + e6[col] + e8[col])/4;

            // If avgOdd > avgEven
            //   Then the area is bright and
            //     NewPixel = min(v1, v3, v5, v7, v9) - max(v2, v4, v6, v8)
            //   Else the area is dark and
            //     NewPixel = min(v2, v4, v6, v8) - max(v1, v3, v5, v7, v9)
            //
            if (oddAvg > evenAvg)
            {
                oddMin = MIN(e1[col], MIN(e3[col], MIN(e5[col], MIN(e7[col], e9[col]))));
                evenMax = MAX(e2[col], MAX(e4[col], MAX(e6[col], e8[col])));
                e1[col] = MAX(0, oddMin-evenMax);
                temp = e1[col];
                if (temp > 0)
                    temp = e1[col];
            }
            else
            {
                evenMin = MIN(e2[col], MIN(e4[col], MIN(e6[col], e8[col])));
                oddMax = MAX(e1[col], MAX(e3[col], MAX(e5[col], MAX(e7[col], e9[col]))));
                e1[col] = MAX(0, evenMin - oddMax);
                temp = e1[col];
                if (temp > 0)
                    temp = e1[col];
            }
        }
    }

    // Set the border pixels, which are not filtered, to zero.
    for (row=0; row<nRows; row++)
    {
        imgOffset = row*(img->cols);
        e1 = img->image+imgOffset;
        if (row<rowEndFilter)
            colBegin = colEndFilter;
        else
            colBegin = 0;
        for (col=colBegin; col<nCols; col++)
            e1[col] = 0;
    }

    return;
}

/////////////////////////////////////////////////////////////////////////////
// Set3x3Pointers
// purpose
//   initialize pointers that will be used when walking the kernel along
//   a row of image data.
// input
//   *img: image struct contains number of rows and columns in the image
//   row: the row of the image on which we are applying the kernel
//   cellWidth, cellHeight: size of the cell which implies the size of the kernel
//
// output
//   e1..e9: pointers to the 9 pixels that will be used for kernel calculations
//

void Set3x3Pointers(IMAGE *img, int row, int cellWidth, int cellHeight,
                    PIX_T **e1, PIX_T **e2, PIX_T **e3, PIX_T **e4, PIX_T **e5, PIX_T **e6,
                    PIX_T **e7, PIX_T **e8, PIX_T **e9)
{
    PIX_T *p1=NULL, *p2=NULL, *p3=NULL;
    int imgOffset;
    int cellWidthTimes2 = cellWidth*2;
    int nCols = img->cols;

    imgOffset = row*(img->cols);
    p1 = img->image+imgOffset;
    p2 = p1+nCols*cellHeight;
    p3 = p1+nCols*2*cellHeight;

    *e1 = p1; *e2 = p1+cellWidth; *e3 = p1+cellWidthTimes2;   /* SET THE POINTERS FOR THE 3 ROWS */
    *e4 = p2; *e5 = p2+cellWidth; *e6 = p2+cellWidthTimes2;   /* (WHOSE POINTERS ROTATE) */
    *e7 = p3; *e8 = p3+cellWidth; *e9 = p3+cellWidthTimes2;
}
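
The listing uses `IMAGE`, `PIX_T`, `MIN` and `MAX` without defining them. The sketch below shows one set of definitions under which the listing would compile. It is an assumption, not taken from the source: the 16-bit pixel type, the field layout beyond the `rows`, `cols` and `image` members the listing accesses, and the helpers `image_create`, `image_free` and `sample_offset` are all hypothetical.

```c
#include <stdlib.h>

/* Assumed definitions; the listing only shows how these names are used. */
typedef unsigned short PIX_T;            /* assumption: 16-bit intensities */

typedef struct {
    int    rows;                         /* number of rows in the image    */
    int    cols;                         /* number of columns in the image */
    PIX_T *image;                        /* row-major pixel data           */
} IMAGE;

/* Note: these macros evaluate their arguments more than once. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical helper: allocate a zero-filled image, or return NULL. */
IMAGE *image_create(int rows, int cols)
{
    IMAGE *img = malloc(sizeof *img);
    if (img == NULL)
        return NULL;
    img->rows = rows;
    img->cols = cols;
    img->image = calloc((size_t)rows * (size_t)cols, sizeof *img->image);
    if (img->image == NULL) {
        free(img);
        return NULL;
    }
    return img;
}

/* Hypothetical helper: release an image made by image_create. */
void image_free(IMAGE *img)
{
    if (img != NULL) {
        free(img->image);
        free(img);
    }
}

/* Offset from img->image of kernel sample k (0..8, row by row) when the
 * kernel's top-left sample sits at (row, col). Samples are one cell apart,
 * matching the pointer arithmetic in Set3x3Pointers. */
long sample_offset(const IMAGE *img, int row, int col,
                   int cellWidth, int cellHeight, int k)
{
    return (long)(row + (k / 3) * cellHeight) * img->cols
           + col + (k % 3) * cellWidth;
}
```

Since the nine samples span 2*cellWidth columns and 2*cellHeight rows beyond the top-left sample, the last position at which the kernel still fits is row `nRows-1-2*cellHeight` and column `nCols-1-2*cellWidth`, which is consistent with the loop bounds in `CheckerFilt`.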
convolving a scanned image with a filter, the scanned image including a first pattern that the filter will convolve
into a second pattern in a convolved image; and

aligning the scanned image according to a position of the second pattern in the convolved image.

2. The method of claim 1, wherein convolving a scanned image with a filter comprises setting a convolved pixel to a
difference between a selected odd pixel and a selected even pixel of the first pattern.

3. The method of claim 2, wherein the selected odd pixel has the lowest intensity of the odd pixels and the selected
even pixel has the highest intensity of the even pixels, if the average intensity of the odd pixels is greater than the
average intensity of the even pixels.

4. The method of claim 2, wherein the selected odd pixel has the highest intensity of the odd pixels and the selected
even pixel has the lowest intensity of the even pixels, if the average intensity of the odd pixels is not greater than
the average intensity of the even pixels.

5. The method of claim 1, wherein the first pattern is a checkerboard pattern.

6. The method of claim 1, wherein the second pattern is a grid pattern.

7. The method of claim 1, wherein aligning the scanned image comprises aligning a grid over the scanned image.

8. The method of claim 7, further comprising adjusting the position of the grid to minimize a sum of the intensities of
pixels along a direction in the grid.

9. The method of claim 1, wherein the scanned image includes multiple copies of the first pattern.

10. The method of claim 9, wherein the scanned image is a rectangle with a copy of the first pattern near each corner.

11. A computer program product that aligns scanned images, comprising:

computer code that convolves a scanned image with a filter, the scanned image including a first pattern that
the filter will convolve into a second pattern in a convolved image;

computer code that aligns the scanned image according to a position of the second pattern in the convolved
image; and

a computer readable medium that stores the computer codes.

12. A method of aligning scanned images, comprising:

synthesizing a chip having attached nucleic acid sequences, the chip including a first pattern of nucleic acid
sequences;

hybridizing labeled nucleic acid sequences to nucleic acid sequences on the chip;

scanning the hybridized chip to produce a scanned image;

convolving the scanned image with a filter, the filter convolving the first pattern into a second pattern in a
convolved image; and

aligning the scanned image according to a position of the second pattern in the convolved image.

13. The method of claim 12, wherein convolving the scanned image with a filter comprises setting a convolved pixel to
a difference between a selected odd pixel and a selected even pixel of the first pattern.

14. The method of claim 13, wherein the selected odd pixel has the lowest intensity of the odd pixels and the selected
even pixel has the highest intensity of the even pixels, if the average intensity of the odd pixels is greater than the
average intensity of the even pixels.

15. The method of claim 13, wherein the selected odd pixel has the highest intensity of the odd pixels and the selected
even pixel has the lowest intensity of the even pixels, if the average intensity of the odd pixels is not greater than
the average intensity of the even pixels.

16. The method of claim 12, wherein the first pattern is a checkerboard pattern.




17. The method of claim 16, wherein the labeled nucleic acid sequences include control nucleic acid sequences that 
hybridize to alternating squares in the checkerboard pattern. 

18. The method of claim 12, wherein the second pattern is a grid pattern. 

19. The method of claim 12, wherein aligning the scanned image comprises aligning a grid over the scanned image.

20. The method of claim 19, further comprising adjusting the position of the grid to minimize a sum of the intensities of 
pixels along a direction in the grid. 






21. The method of claim 12, wherein the scanned image includes multiple copies of the first pattern. 

22. The method of claim 21, wherein the scanned image is a rectangle with a copy of the first pattern near each corner.

23. A computer program product that aligns scanned images, comprising:

computer code that receives as input a scanned image of a chip having attached nucleic acid sequences to
which labeled nucleic acid sequences are hybridized, the chip including a first pattern of nucleic acid
sequences;

computer code that convolves the scanned image with a filter, the filter convolving the first pattern into a
second pattern in a convolved image;

computer code that aligns the scanned image according to a position of the second pattern in the convolved
image; and

a computer readable medium that stores the computer codes.



24. A chip, comprising:

a plurality of polymers attached to the chip in a first pattern so that the first pattern can be convolved into a
second pattern to align the chip for scanning.

25. The chip of claim 24, wherein the first pattern is a checkerboard pattern.

26. The chip of claim 24, wherein the second pattern is a grid pattern.

27. The chip of claim 24, wherein the chip is a rectangle with the plurality of polymers attached to the chip in a first
pattern near each corner.

28. The chip of claim 24, wherein the plurality of polymers are nucleic acid sequences.